position evaluation
Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout
Gundawar, Atharva, Li, Yuchao, Bertsekas, Dimitri
In this paper we apply model predictive control (MPC), rollout, and reinforcement learning (RL) methodologies to computer chess. We introduce a new architecture for move selection, within which available chess engines are used as components. One engine is used to provide position evaluations in an approximation in value space MPC/RL scheme, while a second engine is used as nominal opponent, to emulate or approximate the moves of the true opponent player. We show that our architecture improves substantially the performance of the position evaluation engine. In other words our architecture provides an additional layer of intelligence, on top of the intelligence of the engines on which it is based. This is true for any engine, regardless of its strength: top engines such as Stockfish and Komodo Dragon (of varying strengths), as well as weaker engines. Structurally, our basic architecture selects moves by a one-move lookahead search, with an intermediate move generated by a nominal opponent engine, and followed by a position evaluation by another chess engine. Simpler schemes that forego the use of the nominal opponent, also perform better than the position evaluator, but not quite by as much. More complex schemes, involving multistep lookahead, may also be used and generally tend to perform better as the length of the lookahead increases. Theoretically, our methodology relies on generic cost improvement properties and the superlinear convergence framework of Newton's method, which fundamentally underlies approximation in value space, and related MPC/RL and rollout/policy iteration schemes. A critical requirement of this framework is that the first lookahead step should be executed exactly. This fact has guided our architectural choices, and is apparently an important factor in improving the performance of even the best available chess engines.
Temporal Difference Learning of Position Evaluation in the Game of Go
The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spa(cid:173) tiotemporal interactions that make position evaluation extremely difficult. Development of conventional Go programs is hampered by their knowledge-intensive nature. We demonstrate a viable alternative by training networks to evaluate Go positions via tem(cid:173) poral difference (TD) learning. Our approach is based on network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play. These techniques yield far better performance than undifferentiated networks trained by self(cid:173) play alone.
GamePad: A Learning Environment for Theorem Proving
Huang, Daniel, Dhariwal, Prafulla, Song, Dawn, Sutskever, Ilya
In this paper, we introduce a system called GamePad that can be used to explore the application of machine learning methods to theorem proving in the Coq proof assistant. Interactive theorem provers such as Coq enable users to construct machine-checkable proofs in a step-by-step manner. Hence, they provide an opportunity to explore theorem proving at a human level of abstraction. We use GamePad to synthesize proofs for a simple algebraic rewrite problem and train baseline models for a formalization of the Feit-Thompson theorem. We address position evaluation (i.e., predict the number of proof steps left) and tactic prediction (i.e., predict the next proof step) tasks, which arise naturally in human-level theorem proving.
Understanding AlphaGo
One of required skills as an Artificial Intelligence engineer is ability to understand and explain highly technical research papers in this field. One of my projects as a student in AI Nanodegree classes is an analysis of seminal paper in the field of Game-Playing. The target of my analysis was Nature's paper about technical side of AlphaGo -- Google Deepmind system which for the first time in history beat elite professional Go player, winning by 5 games to 0 with European Go champion -- Fan Hui. The goal of this summary (and my future publications) is to make this knowledge widely understandable, especially for those who are just starting the journey in field of AI or those who doesn't have any experience in this area at all. AlphaGo is narrow AI created by Google DeepMind team to play (and win) board game Go. Before it was presented publicly, the predictions said that according to our state-of-art, we are about 1 decade away from having system with AlphaGo skills (capability to beat a human professional Go player).
Temporal Difference Learning of Position Evaluation in the Game of Go
Schraudolph, Nicol N., Dayan, Peter, Sejnowski, Terrence J.
Computational Neurobiology Laboratory The Salk Institute for Biological Studies San Diego, CA 92186-5800 Abstract The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactionsthat make position evaluation extremely difficult. Development of conventional Go programs is hampered by their knowledge-intensive nature. We demonstrate a viable alternative by training networks to evaluate Go positions via temporal difference(TD) learning. Our approach is based on network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play. These techniques yield far better performance than undifferentiated networks trained by selfplay alone.A network with less than 500 weights learned within 3,000 games of 9x9 Go a position evaluation function that enables a primitive one-ply search to defeat a commercial Go program at a low playing level. 1 INTRODUCTION Go was developed three to four millenia ago in China; it is the oldest and one of the most popular board games in the world.